Towards a Topic Driven Access to Full Text Documents
Abstract
We address the issue of providing topic-driven access to full text documents. The methodology we propose combines topic segmentation and information retrieval techniques. By segmenting the text into topic-driven segments, we obtain small, coherent documents that can be used as a basis for the automatic generation of links, and as a visualization aid for the reader, who is presented with a focused and restricted text snippet. In the presence of a concept hierarchy (ontology), the information retrieval step would connect the obtained segments to concepts in the ontology. In this paper we concentrate on the text segmentation phase: we describe our approach, discuss some related issues, and report on preliminary results.
Similar resources
Towards Topic Driven Access to Full Text Documents
We address the issue of providing topic driven access to full text documents. The methodology we propose is a combination of topic segmentation and information retrieval techniques. By segmenting the text into topic driven segments, we obtain small and coherent documents that can be used in two ways: as a basis for automatically generating hypertext links, and as a visualization aid for the rea...
A review of text mining approaches and their function in discovering and extracting a topic
Background and aim: Four text mining methods are examined, with a focus on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature on text mining and topic modeling. Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...
Finding Topic-centric Identified Experts based on Full Text Analysis
This paper presents a method for finding topic-centric experts from open access metadata and full text documents. Topic-centric information, including experts, is served on OntoFrame, which is a Semantic Web-based academic research information service supporting R&D activities. URI scheme-based OntoFrame provides three entity pages: topic, person, and event. ‘Persons by Topic’ in topic page lists up ...
A New Document Embedding Method for News Classification
Abstract: Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A large volume of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...
SAVVY SEARCHING Open access to scholarly full-text documents
Purpose – The purpose of this article is to discuss open access to scholarly full-text documents. Design/methodology/approach – Discusses open access to scholarly full-text documents. Findings – The paper shows that while open access archives are good for the majority, for publishers, editors and authors, open access articles can substantially increase their impact, and the impact factor for th...